class: center, middle, inverse, title-slide .title[ # Statistical Concepts Everyone Should Know ] .subtitle[ ##
Statistics for Life
] .author[ ###
John
for
The John & Calvin Podcast
] --- class: inverse, center, large <h1 style="font-size: 100px; margin-top: 50px; font-family: Georgia, serif;">I</h1> <h1 style="font-size: 80px; margin-top: 75px;">Neutral Metrics</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 50px;">Misuse and Misunderstandings Mislead</h2> -- <h2 style="font-size: 40px;">mean ≠ median</h2> --- ## 1. Neutral Metrics: Mean vs Median ### Local bar wealth distribution: 50 Patrons
--- ## 1. Neutral Metrics: Mean vs Median ### Local bar wealth distribution: 50 Patrons + Billon Gezosberg
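A minimal sketch of the bar example, using only Python's standard library. The 50 patron net worths are simulated (an assumption, not real data), and the $150B figure for the billionaire is borrowed from the outlier table in this section.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical net worths for 50 bar patrons (assumed, not real data):
# drawn uniformly between $20k and $100k purely for illustration.
patrons = [random.uniform(20_000, 100_000) for _ in range(50)]

# The billionaire walks in; $150B is an assumed figure.
billionaire = 150_000_000_000
bar = patrons + [billionaire]

mean_before = statistics.mean(patrons)
median_before = statistics.median(patrons)
mean_after = statistics.mean(bar)
median_after = statistics.median(bar)

print(f"50 patrons:      mean ${mean_before:,.0f}   median ${median_before:,.0f}")
print(f"+ 1 billionaire: mean ${mean_after:,.0f}   median ${median_after:,.0f}")
```

The mean jumps to roughly $150B ÷ 51, close to $3 billion per person, while the median moves by at most one position in the sorted list and stays inside the patrons' range.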
--- ## 1. Neutral Metrics: Mean vs Median <div style="font-size: 130%; text-align: left; margin: 100px 0;"> <p><strong>Mean</strong>: Sum of all values divided by the number of values.</p> <br> <p><strong>Median</strong>: Middle value when the values are ordered.</p> </div> --- ## 1. Neutral Metrics: Mean vs Median ### Impact of a $150B Outlier on Mean vs. Median — Across Sample Sizes <table class="table" style="font-size: 24px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> Sample Size </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> Median </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 101 </td> <td style="text-align:right;padding-left: 20px;"> $1,485,214,194 </td> <td style="text-align:right;padding-left: 20px;"> $61,587 </td> </tr> <tr> <td style="text-align:right;"> 1,001 </td> <td style="text-align:right;padding-left: 20px;"> $ 149,915,313 </td> <td style="text-align:right;padding-left: 20px;"> $60,141 </td> </tr> <tr> <td style="text-align:right;"> 10,001 </td> <td style="text-align:right;padding-left: 20px;"> $ 15,063,285 </td> <td style="text-align:right;padding-left: 20px;"> $59,610 </td> </tr> <tr> <td style="text-align:right;"> 100,001 </td> <td style="text-align:right;padding-left: 20px;"> $ 1,564,864 </td> <td style="text-align:right;padding-left: 20px;"> $59,897 </td> </tr> <tr> <td style="text-align:right;"> 1,000,001 </td> <td style="text-align:right;padding-left: 20px;"> $ 214,846 </td> <td style="text-align:right;padding-left: 20px;"> $59,868 </td> </tr> <tr> <td style="text-align:right;"> 10,000,001 </td> <td style="text-align:right;padding-left: 20px;"> $ 79,864 </td> <td style="text-align:right;padding-left: 20px;"> $59,885 </td> </tr> <tr> <td style="text-align:right;"> 100,000,001 </td> <td style="text-align:right;padding-left: 20px;"> $ 66,365 </td> <td style="text-align:right;padding-left: 20px;"> $59,876 </td> </tr> </tbody> </table> --- ## 1. 
Neutral Metrics: Mean vs Median A more realistic example  --- .pull-left[ ### Economics & Wealth - GDP per capita - household income - net worth - home price - monthly rent - CEO compensation ### Health & Healthcare - life expectancy - healthcare spending per person - hospital bill - patient out-of-pocket cost ] .pull-right[ ### Academics - test score (SAT, PISA) - GPA - academic citations - speaking invitations per expert ### Other - commute time - screen-time per user - household energy use - carbon emissions per capita - YouTube ad revenue per channel - revenue per app in the App Store - Software Bug Fix Times ] --- ## 1. Neutral Metrics: Mean vs Median #### Gross Domestic Product - GDP `GDP = C + I + G + (X−M)` **C** is consumer spending, **I** is business investment, **G** is government spending, and (**X−M**) is net exports ([Investopedia](https://www.investopedia.com/articles/investing/051415/how-calculate-gdp-country.asp)). GDP per capita = GDP ÷ population = **average** output per person .pull-left[ <center> U.S. GDP per Capita<br> <img src="assets/gdp_per_capita.png" style="width:100%;"><br> <span style="font-size: 50%;"> <a href="https://www.statista.com/statistics/263601/gross-domestic-product-gdp-per-capita-in-the-united-states/">statista</a> </span> </center> ] --- ## 1. Neutral Metrics: Mean vs Median #### Gross Domestic Product - GDP `GDP = C + I + G + (X−M)` **C** is consumer spending, **I** is business investment, **G** is government spending, and (**X−M**) is net exports ([Investopedia](https://www.investopedia.com/articles/investing/051415/how-calculate-gdp-country.asp)). GDP per capita = GDP ÷ population = **average** output per person .pull-left[ <center> U.S. 
GDP per Capita<br> <img src="assets/gdp_per_capita.png" style="width:100%;"><br> <span style="font-size: 50%;"> <a href="https://www.statista.com/statistics/263601/gross-domestic-product-gdp-per-capita-in-the-united-states/">statista</a> </span> </center> ] .pull-right[ <center> <img src="assets/personal_wealth_usa.png" style="width:100%;"><br> <span style="font-size: 50%;"> By <a href="//commons.wikimedia.org/wiki/User:RCraig09" title="User:RCraig09">RCraig09</a> - <span>Own work</span>, <a href="https://creativecommons.org/licenses/by-sa/4.0" title="Creative Commons Attribution-Share Alike 4.0">CC BY-SA 4.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=137146870">Link</a> </span> </center> ] --- ## 1. Neutral Metrics: Mean vs Median ### Pareto Distribution
--- ## 1. Neutral Metrics: Mean vs Median ### Pareto Distribution (x-axis limited)
--- ## 1. Neutral Metrics: Mean vs Median ### Normal Distribution
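The contrast between the two shapes can be sketched by drawing samples and comparing the mean to the median. The Pareto shape of 1.16 (roughly the "80/20" case), the $30,000 minimum, and the normal parameters are all assumptions chosen for illustration, not values taken from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
n = 100_000

# Heavy-tailed sample: Pareto with assumed shape 1.16, scaled to a $30,000 minimum.
pareto = [30_000 * random.paretovariate(1.16) for _ in range(n)]

# Symmetric sample: normal with assumed mean $60,000 and standard deviation $15,000.
normal = [random.gauss(60_000, 15_000) for _ in range(n)]

for name, sample in (("Pareto", pareto), ("Normal", normal)):
    mean = statistics.fmean(sample)
    median = statistics.median(sample)
    print(f"{name}: mean {mean:,.0f}  median {median:,.0f}  ratio {mean / median:.2f}")
```

For the normal sample the ratio sits very close to 1, so mean and median tell the same story. For the Pareto sample the mean is pulled well above the median by a small number of very large values.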
--- ## 1. Neutral Metrics <span style="font-size: 32px;"> A metric is just a calculated number. </span> <br> <span style="font-size: 32px;"> It can be misused, but only if it's not understood. <br> <br> <br> </span> -- <span style="font-size: 32px;"> To understand any metric, you must know: </span> <ul style="font-size: 26px; line-height: 1.4;"> <li>how it's calculated</li> <li>the assumptions it relies on</li> <li>the <b>distribution</b> of the data</li> </ul> --- class: inverse, center, large <h1 style="font-size: 100px; margin-top: 50px; font-family: Georgia, serif;">II</h1> <h1 style="font-size: 80px; margin-top: 75px;">Distributions</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 55px;">What's Typical, What's Rare, What's Extreme</h2> <h2 style="font-size: 40px;">Theoretically</h2> --- ## 2. Distributions ### Theoretical models of where values are likely to occur - **Symmetry vs. Skewness** Are values evenly spread around the center, or do they stretch more in one direction? - **Variability (Spread)** How tightly or widely are the values clustered? - **Tail Behavior** How likely are extreme or outlier values? --- ## 2. Distributions ### Theoretical models of where values are likely to occur
--- ## 2. Distributions ### Theoretical models of where values are likely to occur
--- ## 2. Distributions ### Theoretical models of where values are likely to occur
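For a normal model, "typical", "rare", and "extreme" can be made concrete as the share of values within k standard deviations of the mean. The sketch below relies on the standard identity P(|Z| < k) = erf(k / √2); the helper name `share_within` is made up for this example.

```python
import math


def share_within(k: float) -> float:
    """Probability that a normally distributed value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = share_within(k)
    print(f"within {k} sd: {inside:.4f}   outside: {1 - inside:.4f}")
```

This gives the familiar rule of thumb of roughly 68%, 95%, and 99.7%. It holds only when the normal model is a reasonable fit; heavy-tailed data produce extreme values far more often than these shares suggest.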
---  ---  --- ## 2. Distributions $$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } $$ Where: - `\(\mu\)` = the **mean** (center) - `\(\sigma\)` = the **standard deviation** (spread) - `\(x\)` = the value at which we evaluate the density - `\(f(x)\)` = the probability density at `\(x\)` given `\(\mu\)` and `\(\sigma\)` --- .center[ **Normal distribution** ] $$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 } $$ <hr style="border: 0; height: 1px; background: lightgray;"> <div style="text-align: center;"> <img src="assets/normal_annotated.png"> </div> --- ## 2. Distributions <span style="font-size: 32px;"> A distribution defines what's typical, what's rare, and what's extreme in a dataset. </span> <br> -- <span style="font-size: 32px;"> Identifying the distribution helps you understand: </span> <ul style="font-size: 26px; line-height: 1.4;"> <li>what outcomes are likely and unlikely</li> <li>how much variation or uncertainty exists</li> <li>whether rare or extreme events fall within expectations</li> <li>if deviations from expectations signal a need for more investigation</li> </ul> -- <br> <span style="font-size: 26px;"> Distributions describe single variables. What about relationships between variables? </span> --- class: inverse, center, large <h1 style="font-size: 100px; margin-top: 50px; font-family: Georgia, serif;">III</h1> <h1 style="font-size: 80px; margin-top: 75px;">Correlation, Confounding, Causation</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 80px;">and the Stories We Tell</h2> --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">Correlation</h1> ---  ---  --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">Confounding</h1> --- .pull-left[
- Model: `reading ~ shoe size` - Model p-value (shoe): < 0.001 ] --- .pull-left[
- Model: `reading ~ shoe size` - Model p-value (shoe): < 0.001 ] .pull-right[
- Model: `reading ~ age` - Model p-value (age): < 0.001 ] --- .pull-left[
- Model: `reading ~ shoe size` - Model p-value (shoe): < 0.001 ] .pull-right[
- Model: `reading ~ age` - Model p-value (age): < 0.001 ] <hr style="border: 0; height: 1px; background: lightgray;"> .center[ Combined Model: `reading ~ shoe + age` Shoe p-value: 0.314; Age p-value: **< 0.001** > **Result:** Once age is included, shoe size is no longer significant. <br> Age is the **confounding** variable: it drives both shoe size and reading ability. ] --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">Causation</h1> --- ### Causation: How do we test if A causes B? <hr style="border: 0; height: 1px; background: lightgray;"> -- .pull-left[ <div style="margin-top: 4px;"> <b>Randomized Controlled Trial</b> <ul style="font-size: 80%;"> <li><b>Research Question:</b> Does melatonin increase sleep duration?</li> <br> <li><b>Randomized:</b> Assign subjects by chance to treatment or control group, balancing other factors</li> <br> <li><b>Controlled:</b> Control group used to compare outcomes against the treatment group</li> <br> <li><b>Trial:</b> Apply the intervention and observe outcomes. Compare groups.</li> </ul> </div> ] .pull-right[
] --- ### Causation: How do we test if A causes B? <hr style="border: 0; height: 1px; background: lightgray;"> .pull-left[ <div style="margin-top: 4px;"> <b>Randomized Controlled Trial</b> <ul style="font-size: 80%;"> <li><b>Research Question:</b> Does melatonin increase sleep duration?</li> <br> <li><b>Randomized:</b> Assign subjects by chance to treatment or control group, balancing other factors</li> <br> <li><b>Controlled:</b> Control group used to compare outcomes against the treatment group</li> <br> <li><b>Trial:</b> Apply the intervention and observe outcomes. Compare groups.</li> </ul> </div> ] .pull-right[
] <br> > Experiments test causality by: > randomly assigning subjects to treatment or control groups, > applying the intervention, > and comparing outcomes. --- class: inverse, center, large, middle <h1 style="font-size: 80px; ">The Stories We Tell</h1> ---
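One story that sounds convincing until a third variable is considered is the shoe-size model from the Confounding slides. The sketch below simulates it with made-up data in which age drives both shoe size and reading score, and shoe size has no effect of its own. All coefficients and noise levels are assumptions. Instead of the regression p-values shown on the slides, it uses a partial correlation, a simpler stand-in that needs only the standard library.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical children aged 5 to 12 (simulated, not real data).
n = 2_000
age = [random.uniform(5, 12) for _ in range(n)]
shoe = [a + random.gauss(0, 1) for a in age]            # shoe size depends only on age
reading = [10 * a + random.gauss(0, 10) for a in age]   # reading score depends only on age

r_sr = corr(shoe, reading)
r_sa = corr(shoe, age)
r_ra = corr(reading, age)

# Partial correlation of shoe size and reading, holding age fixed.
partial = (r_sr - r_sa * r_ra) / math.sqrt((1 - r_sa**2) * (1 - r_ra**2))

print(f"corr(shoe, reading)       = {r_sr:.2f}")
print(f"corr(shoe, reading | age) = {partial:.2f}")
```

The raw correlation is strong even though shoe size was never used to generate the reading scores. Once age is held fixed, the remaining association is close to zero.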
---
<small style="font-size:12px; line-height:1.1; display:inline-block;"> "Figure 2 correlates saturated fat and total vegetable oil consumption versus heart disease deaths in the U.S.A., with data on all three dating back to at least 1909."</small> <small style="font-size:12px; line-height:1; display:inline-block;"> Data: <a href="https://www.fns.usda.gov/cnpp/us-foodsupply/nutrient-content-1909-2010">USDA Food Supply</a> | Paper: <a href="https://www.sciencedirect.com/science/article/abs/pii/S0306987717305017?via%3Dihub">ScienceDirect</a> | Video: <a href="https://www.youtube.com/watch?v=7kGnfXXIKZM">YouTube</a> </small> --- ## 3. Correlation, Confounding, Causation <span style="font-size: 28px;">Not all patterns point to cause-and-effect. </span> <br> -- <span style="font-size: 28px;"> To understand relationships between variables, consider: </span> <ul style="font-size: 22px; line-height: 1.4;"> <li>Correlation could mean nothing, or it could mean a lot.</li> <li>Confounding can confuse or obscure real relationships.</li> <li>Causality can be tested (<b>carefully</b>).</li> <li>Are we just building stories we want to tell?</li> </ul> <br> -- <span style="font-size: 26px;"> If variables are related, what pattern does that relationship follow? </span> --- class: inverse, center, large <h1 style="font-size: 100px; margin-top: 50px; font-family: Georgia, serif;">IV</h1> <h1 style="font-size: 80px; margin-top: 75px;">Shapes of Change</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 60px;">What Curve, What Point?</h2> --- ## 4. Shapes of Change
--- ## 4. Shapes of Change
--- ## 4. Shapes of Change
--- ## 4. Shapes of Change
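A small sketch of three common shapes of change: linear, exponential, and logistic (S-shaped). The growth rates and the limit of 1000 are hypothetical values picked so that the exponential and logistic curves start out almost identical.

```python
import math


def linear(t, rate=1.0):
    """Steady growth: the same gain at every step."""
    return 1.0 + rate * t


def exponential(t, rate=0.5):
    """Accelerating growth: gains get larger at every step."""
    return math.exp(rate * t)


def logistic(t, rate=0.5, limit=1000.0):
    """S-curve that starts at 1, grows like the exponential, then saturates at `limit`."""
    return limit / (1 + (limit - 1) * math.exp(-rate * t))


print(" t   linear  exponential  logistic")
for t in (0, 2, 4, 8, 16, 24, 32):
    print(f"{t:2d} {linear(t):8.1f} {exponential(t):12.1f} {logistic(t):9.1f}")
```

Over the first few steps the exponential and logistic values differ by well under one percent, so early observations cannot tell them apart. Later the exponential keeps accelerating while the logistic flattens out just below its limit.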
--- ## 4. Shapes of Change <span style="font-size: 28px;">Depending on context, the same change in input can produce very different changes in output.</span> <br> -- <span style="font-size: 28px;">Recognizing the shape of change helps you understand: </span> <ul style="font-size: 22px; line-height: 1.4;"> <li>whether progress will be steady or accelerating</li> <li>when gains may slow down or speed up unexpectedly</li> <li>how small inputs can sometimes create outsized effects, and how large inputs can hit limits</li> </ul> -- <span style="font-size: 24px;">Early in the process, it’s hard to tell which curve you're on. Misjudging it can lead to frustration or unrealistic expectations.</span> -- <br> <span style="font-size: 24px;">Our expectations shape how we interpret new information.</span> --- class: inverse, center, large <h1 style="font-size: 100px; margin-top: 50px; font-family: Georgia, serif;">V</h1> <h1 style="font-size: 80px; margin-top: 75px;">Bias</h1> <hr style="margin-top: 2em; margin-bottom: 2em;"> -- <h2 style="font-size: 60px;">Expectations and Reality</h2> --- ## 5. Bias <span style="font-size: 20px;">Statistical bias is the **systematic difference** between the expected value of an estimator and the true value of the parameter being estimated.</span> <br> -- <span style="font-size: 20px;">It's the **consistent gap** between what your method tends to estimate and the actual truth, not just random error.</span> <br> -- <span style="font-size: 20px;">Bias means you’re consistently off because of how you **measure, think, or sample**, and you don’t realize it.</span> -- <div style="display: flex; justify-content: center;">
</div> --- <div style="display: flex; justify-content: center;">
</div> --- <div style="display: flex; justify-content: center;">
</div> $$ \text{Bias} = \mathbb{E}[\hat{\theta}] - \theta $$ .pull-left[ **Biased estimator** $$ \text{Bias} = \mathbb{E}[\hat{\theta}] - \theta $$ Bias = "large" ] .pull-right[ **Unbiased estimator** $$ \text{Bias} = \mathbb{E}[\hat{\theta}] - \theta $$ Bias = "none" ] --- ## 5. Bias ### Types of Bias -- <div style="display: flex; font-size: 18px; gap: 20px;"> <ul style="list-style-type: disc; flex: 1;"> <li><b>Sampling bias</b>: Sample isn't representative of the population</li> <li><b>Selection bias</b>: Systematic exclusion or inclusion skews results</li> <li><b>Omitted variable bias</b>: Missing important factors in analysis</li> <li><b>Measurement bias</b>: Data collection errors or inaccuracies</li> <li><b>Confirmation bias</b>: Seeking or interpreting data to confirm beliefs</li> </ul> <ul style="list-style-type: disc; flex: 1;"> <li><b>Recall bias</b>: Faulty memories skew reported data</li> <li><b>Observer bias</b>: Expectations influence how observations are recorded or interpreted</li> <li><b>Survivorship bias</b>: Ignoring those who dropped out or failed</li> <li><b>Publication bias</b>: Only "positive" results get published</li> </ul> </div> -- All these forms share the same principle: they systematically push results away from truth. --- ## 5. Bias The Self-Reinforcing Feedback Loop of Bias
--- ### The Bias Escalation Curve: Conceptual Edition Consequence of the Self-Reinforcing Feedback Loop of Bias
--- ### The Bias Escalation Curve: Carnivore Edition Consequence of the Self-Reinforcing Feedback Loop of Bias
--- ### The Bias Escalation Curve: Vegan Edition Consequence of the Self-Reinforcing Feedback Loop of Bias
--- ## 5. Bias <span style="font-size: 24px;">A statistical estimate can systematically deviate from the true value, just like our beliefs. We call this bias.</span> <br> <span style="font-size: 24px;">By understanding the nature of bias we learn: </span> <ul style="font-size: 20px; line-height: 1.2;"> <li>how it can shape attention, interpretation, and memory</li> <li>why we favor evidence that confirms what we already believe</li> <li>how bias can be self-reinforcing, leading to extreme positions over time</li> <li>how to better recognize our own biases</li> </ul> -- <span style="font-size: 22px;">Bias shapes how we see the world, gradually reinforcing itself and altering our relationship to reality.</span> -- <br> <span style="font-size: 22px;">From time to time ask yourself:</span> >Am I on the Bias Escalation Curve? If so, where am I on it? --- <h1 style="margin-bottom: 0em;">Neutral Metrics</h1> <h4 style="margin-top: 0; font-weight: bold;">Misuse and Misunderstandings Mislead</h4> <h1 style="margin-bottom: 0em;">Distributions</h1> <h4 style="margin-top: 0; font-weight: bold;">What's Typical, What's Rare, What's Extreme - Theoretically</h4> <h1 style="margin-bottom: 0em;">Correlation, Confounding, Causation</h1> <h4 style="margin-top: 0; font-weight: bold;">and the Stories We Tell</h4> <h1 style="margin-bottom: 0em;">Shapes of Change</h1> <h4 style="margin-top: 0; font-weight: bold;">What Curve, What Point?</h4> <h1 style="margin-bottom: 0em;">Bias</h1> <h4 style="margin-top: 0; font-weight: bold;">Expectations and Reality</h4>